71 research outputs found

    LegoDB: customizing relational storage for XML documents

    Journal Article. XML is becoming the predominant data exchange format in a variety of application domains (supply-chain, scientific data processing, telecommunication infrastructure, etc.). Not only is an increasing amount of XML data now being processed, but XML is also increasingly being used in business-critical applications. Efficient and reliable storage is an important requirement for these applications. By relying on relational engines for this purpose, XML developers can benefit from a complete set of data management services (including concurrency control, crash recovery, and scalability) and from the highly optimized relational query processors.

    Bridging the XML-relational divide with LegoDB: a demonstration

    Journal Article. We present LegoDB, a cost-based XML storage mapping engine that automatically explores a space of possible XML-to-relational mappings and selects an efficient mapping for a given application.

    From XML schema to relations: a cost-based approach to XML storage

    Journal Article. As Web applications manipulate an increasing amount of XML, there is a growing interest in storing XML data in relational databases. Due to the mismatch between the complexity of XML's tree structure and the simplicity of flat relational tables, there are many ways to store the same document in an RDBMS, and a number of heuristic techniques have been proposed. These techniques typically define fixed mappings and do not take application characteristics into account. However, a fixed mapping is unlikely to work well for all possible applications. In contrast, LegoDB is a cost-based XML storage mapping engine that explores a space of possible XML-to-relational mappings and selects the best mapping for a given application. LegoDB leverages current XML and relational technologies: 1) it models the target application with an XML Schema, XML data statistics, and an XQuery workload; 2) the space of configurations is generated through XML Schema rewritings; and 3) the best among the derived configurations is selected using cost estimates obtained through a standard relational optimizer. In this paper, we describe the LegoDB storage engine and provide experimental results that demonstrate the effectiveness of this approach.

    Querying XML with update syntax

    This paper investigates a class of transform queries proposed by XQuery Update [6]. A transform query is defined in terms of XML update syntax. When posed on an XML tree T, it returns another XML tree that would be produced by executing its embedded update on T, without destructive impact on T. Transform queries support a variety of applications including XML hypothetical queries, the simulation of updates on virtual views, and the enforcement of XML access control. In light of the wide range of applications for transform queries, we develop automaton-based techniques for efficiently evaluating transform queries and for computing their compositions with user queries in standard XQuery. We provide (a) three algorithms to implement transform queries without change to existing XQuery processors, (b) a linear-time algorithm, based on a seamless integration of automaton execution and SAX parsing, to evaluate transform queries on large XML documents that are difficult to handle by existing XQuery engines, and (c) an algorithm to rewrite the composition of user queries and transform queries into a single efficient query in standard XQuery. We also present experimental results comparing the efficiency of our evaluation and composition algorithms for transform queries.
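
The non-destructive semantics described above can be illustrated with a minimal sketch. This is a naive copy-then-modify baseline written with Python's standard library, not the automaton-based or SAX-based algorithms of the paper. The function name `transform_delete` and the sample order document are hypothetical, chosen only to mimic an access-control view that hides prices.

```python
import copy
import xml.etree.ElementTree as ET


def transform_delete(tree_root, tag):
    """Return a copy of tree_root with every element named `tag` removed.

    The update runs on a deep copy, so the input tree T is left untouched.
    (Hypothetical helper: a naive baseline, not the paper's method.)
    """
    result = copy.deepcopy(tree_root)
    # Collect (parent, child) pairs first so the tree is not mutated
    # while it is being traversed.
    doomed = [(parent, child)
              for parent in result.iter()
              for child in list(parent)
              if child.tag == tag]
    for parent, child in doomed:
        parent.remove(child)
    return result


# Illustrative document (an assumption, not taken from the paper).
doc = ET.fromstring(
    "<order>"
    "<item><name>pen</name><price>2</price></item>"
    "<item><name>ink</name><price>5</price></item>"
    "</order>"
)
view = transform_delete(doc, "price")
print(ET.tostring(view, encoding="unicode"))
```

Copying the whole tree costs memory proportional to the document size, which is one reason a streaming evaluation strategy is attractive for large inputs.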

    Incremental Evaluation of Schema-directed XML Publishing


    Conditional Functional Dependencies for Data Cleaning

    We propose a class of constraints, referred to as conditional functional dependencies (CFDs), and study their applications in data cleaning. In contrast to traditional functional dependencies (FDs) that were developed mainly for schema design, CFDs aim at capturing the consistency of data by incorporating bindings of semantically related values. For CFDs we provide an inference system analogous to Armstrong’s axioms for FDs, as well as consistency analysis. Since CFDs allow data bindings, a large number of individual constraints may hold on a table, complicating detection of constraint violations. We develop techniques for detecting CFD violations in SQL as well as novel techniques for checking multiple constraints in a single query. We experimentally evaluate the performance of our CFD-based methods for inconsistency detection. This not only yields a constraint theory for CFDs but is also a step toward a practical constraint-based method for improving data quality.
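
To make the idea of a CFD concrete, here is a minimal in-memory sketch of violation detection for a single CFD with one pattern row. It is an illustration in Python under stated assumptions, not the SQL-based detection technique of the paper. The names `cfd_violations`, `matches`, and `WILDCARD`, and the sample address rows, are hypothetical. The example constraint reads: for rows with country code 44, ZIP determines STR.

```python
from collections import defaultdict

WILDCARD = "_"  # assumed marker for an unconstrained pattern position


def matches(row, pattern):
    """True if row agrees with every constant in the pattern."""
    return all(v == WILDCARD or row[a] == v for a, v in pattern.items())


def cfd_violations(rows, lhs, rhs, lhs_pattern, rhs_pattern):
    """Return sorted indices of rows that violate the CFD (lhs -> rhs, pattern)."""
    bad = set()
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        if not matches(row, lhs_pattern):
            continue  # the CFD only constrains rows selected by the LHS pattern
        if not matches(row, rhs_pattern):
            bad.add(i)  # a single row contradicts a constant binding
        groups[tuple(row[a] for a in lhs)].append(i)
    for members in groups.values():
        rhs_values = {tuple(rows[i][a] for a in rhs) for i in members}
        if len(rhs_values) > 1:
            bad.update(members)  # rows agree on lhs but disagree on rhs
    return sorted(bad)


# Illustrative data (an assumption for this sketch).
rows = [
    {"CC": "44", "ZIP": "EH4", "STR": "Mayfield"},
    {"CC": "44", "ZIP": "EH4", "STR": "Crichton"},
    {"CC": "01", "ZIP": "07974", "STR": "Mtn Ave"},
    {"CC": "01", "ZIP": "07974", "STR": "Main St"},
]
violations = cfd_violations(
    rows,
    lhs=["CC", "ZIP"],
    rhs=["STR"],
    lhs_pattern={"CC": "44", "ZIP": WILDCARD},
    rhs_pattern={"STR": WILDCARD},
)
print(violations)
```

The last two rows also disagree on STR for the same ZIP, but they are not flagged because the pattern restricts the dependency to CC = 44. A plain FD on the same attributes would flag all four rows.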